Trying out the Custom Model Import feature of Amazon Bedrock

#Amazon Bedrock
#AWS
Takakuni
2024.12.23
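Hello! This is Takakuni (@takakuni_) from the Consulting Department of the AWS Business Division.
It was quite a while ago now, but the Custom Model Import feature of Amazon Bedrock has reached GA.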
https://aws.amazon.com/jp/about-aws/whats-new/2024/10/amazon-bedrock-custom-model-import/
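With this, in addition to the foundation models that were already available, fine-tuned models can now be used in Bedrock.

Custom Model Import
As the name suggests, Custom Model Import is a feature for importing models that were trained on other platforms such as SageMaker or Hugging Face.
Importing a model into Amazon Bedrock as a custom model gives you a relatively wallet-friendly inference server.
Custom Model Import currently supports the following architectures.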
Mistral
Mixtral
Flan
Llama (v2, 3, 3.1, 3.2)
IBM Granite (not explicitly listed in the official documentation, but introduced in AWS documentation)
https://docs.aws.amazon.com/bedrock/latest/userguide/model-customization-import-model.html#model-customization-import-model-architecture
Note: the weight files of an imported model must be under 100 GB for multimodal models and under 200 GB for text models.
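Before uploading anything, it may be worth checking the local weights against these limits. The following is a minimal sketch, not an official tool: the set of weight-file extensions and the reading of "GB" as 10^9 bytes are my assumptions, and the function names are hypothetical.

```python
from pathlib import Path

# Limits quoted in the note above (weights must be *under* these sizes).
TEXT_LIMIT_GB = 200
MULTIMODAL_LIMIT_GB = 100

# Assumption: weight files are the ones with these extensions.
WEIGHT_SUFFIXES = {".safetensors", ".bin"}


def weights_size_bytes(model_dir: str) -> int:
    """Sum the sizes of all weight files under model_dir (recursively)."""
    return sum(
        p.stat().st_size
        for p in Path(model_dir).rglob("*")
        if p.is_file() and p.suffix in WEIGHT_SUFFIXES
    )


def fits_import_limit(size_bytes: int, multimodal: bool = False) -> bool:
    """Return True if size_bytes is under the limit for the model type."""
    limit_gb = MULTIMODAL_LIMIT_GB if multimodal else TEXT_LIMIT_GB
    # Assumption: 1 GB = 10**9 bytes; the documentation does not spell this out.
    return size_bytes < limit_gb * 10**9
```

For a cloned model directory, `fits_import_limit(weights_size_bytes("karakuri-lm-8x7b-chat-v0.1"))` gives a quick yes/no before spending time on the upload.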
In addition, with this GA release, imported models can be integrated with Amazon Bedrock Knowledge Bases, Agents, and Guardrails, and the Converse API is supported as well.
Integration with Amazon Bedrock Features: Imported custom models can be seamlessly integrated with the native tools and features of Amazon Bedrock, such as Knowledge Bases, Guardrails, Agents, and Model Evaluation. This unified experience enables developers to use the same tooling and workflows across both base FMs and imported custom models.
Leverage Amazon Bedrock converse API: Amazon Custom Model Import allows our customers to use their supported fine-tuned models with Amazon Bedrock Converse API which simplifies and unifies the access to the models.
https://aws.amazon.com/blogs/machine-learning/amazon-bedrock-custom-model-import-now-generally-available/
https://docs.aws.amazon.com/bedrock/latest/userguide/custom-model-import-code-samples-converse.html
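Being able to use so-called Japanese LLMs together with the existing Bedrock features is really exciting.
However, Custom Model Import is currently supported only in the N. Virginia and Oregon regions. I am looking forward to support in the Tokyo region.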
Amazon Bedrock Custom Model Import is available in the US East (N. Virginia) and US West (Oregon) AWS Regions only.
https://docs.aws.amazon.com/bedrock/latest/userguide/model-customization-import-model.html
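
Pricing
Unlike the existing foundation models, billing is not based on the number of input and output tokens.
Drawn as a diagram, it would look something like this.
A custom model is billed along two axes: Custom Model Units (hereafter CMU) and a monthly storage cost per CMU. (The model import job itself incurs no charge.)
CMUs are billed in 5-minute windows, and at the time of writing the v1 rate appears to be $0.0785 per CMU per minute. (Please check the pricing page for the latest information.)
The number of CMUs differs by model architecture and appears to be determined at import time.
For reference, the pricing page states that a Mistral 7B 32K model requires 1 CMU and a Llama 3.2 11B 128K model requires 4 CMUs.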

Mistral
The Custom Model Units needed to host a model depend on a variety of factors - notably the model architecture, model parameter count, and context length. The exact number of Custom Model Units needed will be determined at the time of import. For reference, Mistral 7B 32K model requires 1 Custom Model Unit.

Multimodal Llama
The Custom Model Units needed to host a model depend on a variety of factors - notably the model architecture, model parameter count, and context length. The exact number of Custom Model Units needed will be determined at the time of import. For reference, Llama 3.2 11B 128K model requires 4 Custom Model Units.
It is nice that, by default, the number of model copies (nodes) scales automatically between 0 and 3.
On-Demand Inference Pricing:

You are billed in 5-minute windows for the duration your model copy is active starting from the first successful invocation. The maximum throughput and concurrency limit per model copy depends on factors such as input/output token mix, hardware type, model size, architecture, inference optimizations, and is determined during the model import workflow.
Bedrock automatically scales the number of model copies depending on your usage patterns. If there are no invocations for a 5-minute period, Bedrock will scale down to zero and scale back up when you invoke your model. While scaling back up, you may experience a cold-start duration (in tens of seconds) depending on model size. Bedrock also scales up the number of model copies if your inference volume consistently exceeds the concurrency limits of a single model copy. Note: There is a default maximum of 3 model copies per account per imported model that can be increased through Service Quotas.
https://aws.amazon.com/bedrock/pricing/?nc1=h_ls
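To get a feel for the numbers, the billing rule quoted above can be turned into a rough estimate. This is only a sketch under stated assumptions: I assume a partially used 5-minute window is billed in full and that every active model copy is billed for the model's CMU count. Storage cost is left out, and the function name is hypothetical.

```python
import math

PRICE_PER_CMU_MINUTE = 0.0785  # USD, the v1 rate quoted at the time of writing
BILLING_WINDOW_MIN = 5         # billing happens in 5-minute windows


def inference_cost_usd(active_minutes: float, cmus: int, copies: int = 1) -> float:
    """Estimate the on-demand cost of one active period of an imported model."""
    # Assumption: a partially used window is rounded up to a full window.
    windows = math.ceil(active_minutes / BILLING_WINDOW_MIN)
    billed_minutes = windows * BILLING_WINDOW_MIN
    # Assumption: each active model copy is billed for the model's CMU count.
    return billed_minutes * cmus * copies * PRICE_PER_CMU_MINUTE
```

Under these assumptions, 7 minutes of activity on a 1-CMU model is billed as two windows (10 minutes), which comes to about $0.785, and one hour on a 4-CMU model comes to about $18.84.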

Trying it out
This time I will import karakuri-ai/karakuri-lm-8x7b-chat-v0.1 from KARAKURI.
https://huggingface.co/karakuri-ai/karakuri-lm-8x7b-chat-v0.1

Hugging Face
Download the model from Hugging Face.
If you have not installed git-lfs yet, do so beforehand.
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install

# Download the model
git clone https://huggingface.co/karakuri-ai/karakuri-lm-8x7b-chat-v0.1
It depends on network conditions, but the download took about 20 minutes for me.

Changing chat_template
To make the model work with the Converse API, edit chat_template in tokenizer_config.json.
tokenizer_config.json
{
	"add_bos_token": true,
	"add_eos_token": false,
	"added_tokens_decoder": {
		"0": {
			"content": "<unk>",
			"lstrip": false,
			"normalized": false,
			"rstrip": false,
			"single_word": false,
			"special": true
		},
		"1": {
			"content": "<s>",
			"lstrip": false,
			"normalized": false,
			"rstrip": false,
			"single_word": false,
			"special": true
		},
		"2": {
			"content": "</s>",
			"lstrip": false,
			"normalized": false,
			"rstrip": false,
			"single_word": false,
			"special": true
		}
	},
	"additional_special_tokens": [],
	"bos_token": "<s>",
+	"chat_template": "{%- if messages[0]['role'] == 'system' %}\n    {%- set system_message = messages[0]['content'] %}\n    {%- set loop_messages = messages[1:] %}\n{%- else %}\n    {%- set loop_messages = messages %}\n{%- endif %}\n\n{{- bos_token }}\n{%- for message in loop_messages %}\n    {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}\n        {{- raise_exception('After the optional system message, conversation roles must alternate user/assistant/user/assistant/...') }}\n    {%- endif %}\n    {%- if message['role'] == 'user' %}\n        {%- if loop.first and system_message is defined %}\n            {{- ' [INST] ' + system_message + '\\n\\n' + message['content'] + ' [/INST]' }}\n        {%- else %}\n            {{- ' [INST] ' + message['content'] + ' [/INST]' }}\n        {%- endif %}\n    {%- elif message['role'] == 'assistant' %}\n        {{- ' ' + message['content'] + eos_token}}\n    {%- else %}\n        {{- raise_exception('Only user and assistant roles are supported, with the exception of an initial optional system message!') }}\n    {%- endif %}\n{%- endfor %}\n",
-       "chat_template": "{{ bos_token }}{% if messages[0]['role'] == 'system' %}{% set loop_messages = messages[1:] %}{% set system_message = messages[0]['content'] %}{% else %}{% set loop_messages = messages %}{% set system_message = false %}{% endif %}{% for message in loop_messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if loop.index0 == 0 and system_message != false %}{% set content = '<<SYS>>\\n' + system_message + '\\n<</SYS>>\\n\\n' + message['content'] %}{% else %}{% set content = message['content'] %}{% endif %}{% if message['role'] == 'user' %}{% set helpfulness = message['helpfulness']|string or '4' %}{% set correctness = message['correctness']|string or '4' %}{% set coherence = message['coherence']|string or '4' %}{% set complexity = message['complexity']|string or '4' %}{% set verbosity = message['verbosity']|string or '4' %}{% set quality = message['quality']|string or '4' %}{% set toxicity = message['toxicity']|string or '0' %}{% set humor = message['humor']|string or '0' %}{% set creativity = message['creativity']|string or '0' %}{{ '[INST] ' + content + ' [ATTR] helpfulness: ' + helpfulness + ' correctness: ' + correctness + ' coherence: ' + coherence + ' complexity: ' + complexity + ' verbosity: ' + verbosity + ' quality: ' + quality + ' toxicity: ' + toxicity + ' humor: ' + humor + ' creativity: ' + creativity + ' [/ATTR] [/INST]' }}{% elif message['role'] == 'assistant' %}{{ content + eos_token }}{% endif %}{% endfor %}",
	"clean_up_tokenization_spaces": false,
	"eos_token": "</s>",
	"legacy": true,
	"model_max_length": 1000000000000000019884624838656,
	"pad_token": null,
	"sp_model_kwargs": {},
	"spaces_between_special_tokens": false,
	"tokenizer_class": "LlamaTokenizer",
	"unk_token": "<unk>",
	"use_default_system_prompt": false
}
Note that the required template differs depending on the base model.
https://docs.aws.amazon.com/bedrock/latest/userguide/custom-model-import-code-samples-converse.html
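To see what prompt the new template produces, its logic can be mirrored in plain Python. This is a sketch for illustration only: the real rendering is done by a Jinja engine, so whitespace handling may differ in detail. `render_prompt` is a hypothetical helper, not part of any library.

```python
BOS, EOS = "<s>", "</s>"  # bos_token / eos_token from tokenizer_config.json


def render_prompt(messages: list[dict]) -> str:
    """Mirror the logic of the new chat_template in plain Python."""
    system = None
    if messages and messages[0]["role"] == "system":
        system = messages[0]["content"]
        messages = messages[1:]

    out = BOS
    for i, msg in enumerate(messages):
        # After the optional system message, roles must alternate user/assistant.
        if (msg["role"] == "user") != (i % 2 == 0):
            raise ValueError("conversation roles must alternate user/assistant")
        if msg["role"] == "user":
            if i == 0 and system is not None:
                # The system message is folded into the first user turn.
                out += " [INST] " + system + "\n\n" + msg["content"] + " [/INST]"
            else:
                out += " [INST] " + msg["content"] + " [/INST]"
        elif msg["role"] == "assistant":
            out += " " + msg["content"] + EOS
        else:
            raise ValueError("only user and assistant roles are supported")
    return out
```

Compared with the original template, the `[ATTR] ... [/ATTR]` block and the `<<SYS>>` wrapper are gone, so only the plain `[INST] ... [/INST]` structure remains.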
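
Sync to S3
Custom Model Import reads the model from S3.
So, create an S3 bucket beforehand and upload the locally downloaded model to it.
Note that the S3 bucket must be in the same region as the one where you use Custom Model Import. (Uploading about 175 GB took 20 to 30 minutes.)
# Change the bucket name (model-import-XXXXXXXXXXXX) to your own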
aws s3 sync karakuri-lm-8x7b-chat-v0.1/ s3://model-import-XXXXXXXXXXXX/karakuri-lm-8x7b-chat-v0.1
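If you would rather do the upload from Python, the following is a minimal sketch using boto3. The helper names are hypothetical, and skipping the `.git` directory is my own design choice, since a plain sync of a cloned repository would include it. Unlike `aws s3 sync`, this sketch does not skip files that are already uploaded.

```python
from pathlib import Path


def plan_uploads(local_dir: str, prefix: str) -> list[tuple[str, str]]:
    """Map each local file to its S3 key, skipping the .git directory."""
    root = Path(local_dir)
    plan = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if path.is_file() and ".git" not in rel.parts:
            plan.append((str(path), f"{prefix.rstrip('/')}/{rel.as_posix()}"))
    return plan


def upload_model(local_dir: str, bucket: str, prefix: str) -> None:
    """Upload every planned file; upload_file uses multipart for large files."""
    import boto3  # imported lazily so plan_uploads works without boto3 installed

    # The bucket must be in the same region as Custom Model Import.
    s3 = boto3.client("s3", region_name="us-east-1")
    for local_path, key in plan_uploads(local_dir, prefix):
        s3.upload_file(local_path, bucket, key)
```

A call would look like `upload_model("karakuri-lm-8x7b-chat-v0.1", "model-import-XXXXXXXXXXXX", "karakuri-lm-8x7b-chat-v0.1")`, with the bucket name replaced by your own.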
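
Model import
Now import the model.
From the management console, select the S3 bucket and create an import job.
The job has completed. It looks like you can check how long the job took.
That should come in handy in the future.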
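I used the management console here, but the same job can presumably be created with boto3 as well. Below is a sketch that I have not run against this model: the job name, model name, and role ARN are hypothetical placeholders, and the IAM role must allow Bedrock to read the bucket.

```python
import time


def build_import_request(job_name: str, model_name: str, role_arn: str, s3_uri: str) -> dict:
    """Build the keyword arguments for bedrock.create_model_import_job."""
    return {
        "jobName": job_name,
        "importedModelName": model_name,
        "roleArn": role_arn,
        "modelDataSource": {"s3DataSource": {"s3Uri": s3_uri}},
    }


def import_model(request: dict, poll_seconds: int = 60) -> str:
    """Start the import job and poll until it leaves the InProgress state."""
    import boto3  # lazy import: only needed when actually calling AWS

    bedrock = boto3.client("bedrock", region_name="us-east-1")
    job_arn = bedrock.create_model_import_job(**request)["jobArn"]
    while True:
        status = bedrock.get_model_import_job(jobIdentifier=job_arn)["status"]
        if status != "InProgress":
            return status  # "Completed" or "Failed"
        time.sleep(poll_seconds)


# Hypothetical values: replace the names, role ARN, and bucket with your own.
request = build_import_request(
    job_name="karakuri-import-job",
    model_name="karakuri-lm-8x7b-chat",
    role_arn="arn:aws:iam::123456789012:role/bedrock-model-import-role",
    s3_uri="s3://model-import-XXXXXXXXXXXX/karakuri-lm-8x7b-chat-v0.1/",
)
```

Calling `import_model(request)` would then start the job and block until it finishes.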
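
Running the model
Let's run the model in the playground.
It felt like I waited a few minutes for the cold start, but the model runs. Wonderful.

Running from code
Let's run the model from code. It is great that the Converse API can be called.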
app.py
import boto3
from botocore.config import Config

# Retry settings so that the SDK retries while the imported model is still starting up
config = Config(
    retries={
        'max_attempts': 10,
        'mode': 'standard'
    }
)

# Pass the retry settings to the client; without config=config they have no effect
client = boto3.client("bedrock-runtime", region_name="us-east-1", config=config)

# ARN of the imported model (replace the account ID and model ID with your own)
model_id = 'arn:aws:bedrock:us-east-1:123456789012:imported-model/XXXXXXXXXX'

# "Hello! What is your name?"
prompt = "こんにちは！あなたの名前は何ですか？"

try:
    streaming_response = client.converse_stream(
        modelId=model_id,
        messages=[
            {
                "role": "user",
                "content": [{"text": prompt}],
            }
        ],
        inferenceConfig={"maxTokens": 512, "temperature": 1.0, "topP": 0.9},
    )

    # Print each text delta as soon as it arrives
    for chunk in streaming_response["stream"]:
        if "contentBlockDelta" in chunk:
            text = chunk["contentBlockDelta"]["delta"]["text"]
            print(text, end="")
except Exception as e:
    print(e)
    print(repr(e))
The response came back. Wonderful.
(custom-model-import-py3.12) takakuni@ app % python converse.py
 こんにちは！私はKARAKURI LMといいます。カラクリ株式会社によって開発された大規模言語モデルです。どんな質問でもお答えできるので、みなさんのお役に立てれば幸いです。%
For the retry settings, I referred to the following documentation.
https://docs.aws.amazon.com/bedrock/latest/userguide/invoke-imported-model.html#handle-model-not-ready-exception
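If you want explicit control in addition to the SDK-level retry settings, a small wrapper can retry only while the model is still cold-starting. This is a sketch, not the method from the documentation: `call_with_retry` is a hypothetical helper, and it assumes the error surfaces as a botocore `ClientError` whose error code is `ModelNotReadyException`.

```python
import time


def call_with_retry(fn, max_attempts: int = 10, base_delay: float = 1.0, sleep=time.sleep):
    """Call fn(), retrying with exponential backoff while the model is not ready."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as e:
            # botocore's ClientError carries the error code in e.response.
            code = getattr(e, "response", {}).get("Error", {}).get("Code")
            # Re-raise anything else, and give up after the last attempt.
            if code != "ModelNotReadyException" or attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


# Usage with the client from the script above (not executed here):
# response = call_with_retry(
#     lambda: client.converse(
#         modelId=model_id,
#         messages=[{"role": "user", "content": [{"text": prompt}]}],
#     )
# )
```

The `sleep` parameter exists only so the backoff can be swapped out in tests. In normal use the default `time.sleep` is fine.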
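
Summary
That was "Trying out the Custom Model Import feature of Amazon Bedrock".
Getting started takes a bit of time, but being able to import models that Bedrock does not support natively is very convenient.
I hope this blog is helpful to someone.
This was Takakuni (@takakuni_) from the Consulting Department of the AWS Business Division!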